Exploration is about discovery: looking for insights and themes. One data-aggregation approach rests on some “linear model” assumptions but begins in the exploratory phase. This is Exploratory Factor Analysis (EFA).
Generally, EFA aims to aggregate features of the data. This is common in the social sciences (e.g., psychology), where you ask a participant a lot of questions and want to see whether a group of those questions has similar responses.
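A minimal sketch of that idea on simulated data (the `trait.*` and `Q*` names are made up for illustration): two latent traits each drive five “questions”, and factanal() recovers the grouping.

```r
set.seed(123);
n = 500;
trait.A = rnorm(n);  trait.B = rnorm(n);
# five noisy "questions" per trait ...
Q = cbind( sapply(1:5, function(i) { trait.A + rnorm(n, sd=0.8) }),
           sapply(1:5, function(i) { trait.B + rnorm(n, sd=0.8) }) );
colnames(Q) = paste0("Q", 1:10);
print( loadings( factanal(Q, factors=2) ), cutoff=0.3 );
# Q1-Q5 load on one factor, Q6-Q10 on the other
```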
For this introduction we will use the personality-raw data which is part of the latest version [0.1.4] of library(humanVerseWSU);
packageVersion("humanVerseWSU"); # ‘0.1.4’ [SHOULD BE THIS]
## [1] '0.1.4.2'
library(humanVerseWSU);
# You need R tools for this to work: https://cran.r-project.org/bin/windows/Rtools/
# You may want to see if you have the latest version...
# library(devtools);
# detach(package:humanVerseWSU);
# install_github("MonteShaffer/humanVerseWSU/humanVerseWSU");
# Choose (3) None to minimize headaches ....
# library(humanVerseWSU);
## this code was in correlation notebook, repeating here ...
personality.raw = readRDS( system.file("extdata", "personality-raw.rds", package="humanVerseWSU") );
cleanupPersonalityDataFrame = function(personality.raw)
{
  df = removeColumnsFromDataFrame(personality.raw, "V00");
  dim(df); # 838
  ywd.cols = c("year","week","day");
  ywd = convertDateStringToFormat( df$date_test,
                                   c("%Y","%W","%j"),
                                   ywd.cols,
                                   "%m/%d/%Y %H:%M"
                                 );
  ndf = replaceDateStringWithDateColumns(df, "date_test", ywd);
  ndf = sortDataFrameByNumericColumns(ndf, ywd.cols, "DESC");
  ndf = removeDuplicatesFromDataFrame(ndf, "md5_email");
  dim(ndf); # 678
  ndf;
}
personality.clean = cleanupPersonalityDataFrame(personality.raw);
### let's examine the data in total
personality.Vs = removeColumnsFromDataFrame(personality.clean,c("md5_email","year","week","day"));
X = personality.Vs;
Xs = scale(X); # scaling is a good habit to get into, even when the data units are similar (as is the case here).
# There may also be subject-level biases (e.g., a tendency for a given respondent to answer questions higher or lower). A "within-subject" scaling [row-level] followed by a "within-item" scaling [col-level] might be appropriate; or maybe that would remove some of the findings and their meaning.
Xs = as.data.frame(Xs);
Xs;
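The “within-subject then within-item” idea from the comment above can be sketched as follows (assumption: rows are subjects, columns are items); it is not used in what follows, and is shown only to make the idea concrete.

```r
X.row  = t( scale( t( as.matrix(X) ) ) );  # row-level: remove each subject's mean/spread
                                           # (a subject answering every item identically would produce NaN)
X.both = scale( X.row );                   # col-level: then standardize each item
```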
Remember, during “exploration” there really aren’t any significant constraints. In the previous notebook, I showed a few “normality” overlays on the PCA graphs for the countries (the elliptical forms).
Some call the following checks “diagnostic tests.”
“The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be caused by underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with your data. If the value is less than 0.50, the results of the factor analysis probably won’t be very useful.”
https://www.ibm.com/support/knowledgecenter/SSLVMB_23.0.0/spss/tutorials/fac_telco_kmo_01.html
# this is the standard correlation matrix
Xs.corr = cor(Xs);
# library(KMO); # install.packages("KMO", dependencies=TRUE); # not available for R == 4.0.2
library(REdaS); # install.packages("REdaS", dependencies=TRUE);
# https://www.rdocumentation.org/packages/REdaS/versions/0.9.3/topics/Kaiser-Meyer-Olkin-Statistics
Xs.KMO = KMOS(Xs);
str(Xs.KMO);
## List of 7
## $ call : language KMOS(x = Xs)
## $ cormat : num [1:60, 1:60] 1 0.3533 -0.1868 0.0734 0.2002 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:60] "V01" "V02" "V03" "V04" ...
## .. ..$ : chr [1:60] "V01" "V02" "V03" "V04" ...
## $ pcormat: num [1:60, 1:60] 1.7222 0.0544 0.0738 -0.0578 0.0135 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:60] "V01" "V02" "V03" "V04" ...
## .. ..$ : chr [1:60] "V01" "V02" "V03" "V04" ...
## $ n : int 678
## $ k : int 60
## $ MSA : Named num [1:60] 0.923 0.92 0.824 0.93 0.947 ...
## ..- attr(*, "names")= chr [1:60] "V01" "V02" "V03" "V04" ...
## $ KMO : num 0.941
## - attr(*, "class")= chr "MSA_KMO"
my.kmo = Xs.KMO$KMO;
my.kmo;
## [1] 0.9408998
if(my.kmo >= 0.90)
{
print("marvelous!");
} else if(my.kmo >= 0.80)
{
print("meritorious!");
} else if(my.kmo >= 0.70)
{
print("middling!");
} else if(my.kmo >= 0.60)
{
print("mediocre!");
} else if(my.kmo >= 0.50)
{
print("miserable!");
} else {
print("mayhem!");
print("Oh snap!");
print("Kaiser-Meyer-Olkin (KMO) Test is a measure of how suited your data is for Factor Analysis. The test measures sampling adequacy for each variable in the model and for the complete model. The statistic is a measure of the proportion of variance among variables that might be common variance. The lower the proportion, the more suited your data is to Factor Analysis. <https://www.statisticshowto.com/kaiser-meyer-olkin/>");
}
## [1] "marvelous!"
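As a sanity check, the KMO statistic can be computed by hand (a sketch): it compares the off-diagonal squared correlations to the off-diagonal squared partial correlations, which come from the inverse of the correlation matrix.

```r
kmo.manual = function(R)
{
  Ri = solve(R);                                   # inverse correlation matrix
  P  = -Ri / sqrt( outer( diag(Ri), diag(Ri) ) );  # partial correlations
  diag(R) = 0;  diag(P) = 0;                       # keep off-diagonal terms only
  sum(R^2) / ( sum(R^2) + sum(P^2) );              # KMO
}
kmo.manual(Xs.corr);  # should agree with Xs.KMO$KMO, about 0.94
```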
“Bartlett’s test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection. Small values (less than 0.05) of the significance level indicate that a factor analysis may be useful with your data.”
https://www.ibm.com/support/knowledgecenter/SSLVMB_23.0.0/spss/tutorials/fac_telco_kmo_01.html
# https://www.statology.org/bartletts-test-of-sphericity/
# Bartlett’s Test of Sphericity is not the same as Bartlett’s Test for Equality of Variances.
# this is the standard correlation matrix
Xs.corr = cor(Xs);
library(psych); # install.packages("psych", dependencies=TRUE);
Xs.bartlett = cortest.bartlett(Xs.corr, n = nrow(Xs));
str(Xs.bartlett);
## List of 3
## $ chisq : num 18349
## $ p.value: num 0
## $ df : num 1770
# alpha level
alpha = 0.05;
if(Xs.bartlett$p.value < alpha)
{
print(paste0("Bartlett's test of sphericity ... pvalue < alpha ... ", Xs.bartlett$p.value , " < ", alpha, " ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"));
} else {
print("Oh snap!");
print("To put this in layman's terms, the variables in our dataset are fairly uncorrelated so a data reduction technique like PCA or factor analysis would have a hard time compressing these variables into linear combinations that are able to capture significant variance present in the data. <https://www.statology.org/bartletts-test-of-sphericity/>");
}
## [1] "Bartlett's test of sphericity ... pvalue < alpha ... 0 < 0.05 ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"
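The statistic itself is simple to build by hand (a sketch): Bartlett's chi-square comes from the log-determinant of the correlation matrix, with degrees of freedom equal to the number of correlation pairs.

```r
p = ncol(Xs.corr);  n = nrow(Xs);
chisq = -( n - 1 - (2*p + 5)/6 ) * log( det(Xs.corr) );
df    = p * (p - 1) / 2;
c(chisq = chisq, df = df);  # should agree with Xs.bartlett (chisq ~ 18349, df = 1770)
```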
# will be available in humanVerseWSU ... 0.1.4.1 (coming soon) ...
# performBartlettSphericityTest(Xs);
# performBartlettSphericityTest(Xs.corr, n = nrow(Xs));
# myMeasure ...
When you watch some YouTube videos or listen to the machine-learning crowd, they will suggest these tests are a waste of time. You can believe that crowd if you want, but it may also be a waste of time to perform analysis on data that isn’t designed to be aggregated in this form.
Note: “Unsupervised learning” simply means “requires no human interaction.” I can program many of these data aggregation techniques to be completely automated (including developing a programmatic strategy to ascertain the ideal number of factors): ‘I want my “AI algorithm” to run, so I don’t want to perform a test and stop without executing. I have so much data; at least show them something.’ I emphasize this represents the “other guys”, not good data analysts.
In this section, I will review various approaches to arrive at a conclusion about the number of factors (without merely looking at a scree plot, like we did in kmeans (wss)).
# pick a maximum number to examine
maxFactors = 8; # I have 60 variables
Xs.vss = vss(Xs, n = maxFactors); # could also input Xs.corr, but you would want to include n.obs = nrow(Xs)
#str(Xs.vss);
Xs.vss.dataframe = cbind(1:maxFactors,Xs.vss$vss.stats[,c(1:3)]);
colnames(Xs.vss.dataframe) = c("Factors","d.f.","chisq","p.value");
Xs.vss.dataframe;
# we have so much data, it is saying we could use many different factors ...
# the choice for "4" seems to mean the "gain of a new factor" is minimal.
# https://personality-project.org/r/vss.html
# `__student_access__\sample_latex_files\Multivariate-2009\MonteShaffer_Stats519_HW5.pdf` # See HW5, pg 11 of a custom `multifactanal` table I constructed to use multiple criteria to assess optimal factors ...
Xs.corr.eigen = eigen(Xs.corr);
library(nFactors); # install.packages("nFactors", dependencies=TRUE);
# Basic
plotuScree(Xs.corr.eigen$values);
abline(h = 1, col="blue");
Xs.corr.eigen$values[Xs.corr.eigen$values > 1];
## [1] 14.217853 6.021335 3.009303 2.306947 2.023039 1.587564 1.278967
## [8] 1.193839 1.146214 1.073433 1.045484
# technically 11 are greater than 1.
# 5-6 would seem reasonable based on what we see
# Steroids
nResults = nScree(eig = Xs.corr.eigen$values,
aparallel = parallel(
subject = nrow(Xs),
var = ncol(Xs) )$eigen$qevpea);
plotnScree(nResults, main="Component Retention Analysis");
# This is suggesting "6" based on Parallel Analysis and Optimal Coordinates ...
str(nResults);
## List of 3
## $ Components:'data.frame': 1 obs. of 4 variables:
## ..$ noc : num 6
## ..$ naf : num 1
## ..$ nparallel: int 6
## ..$ nkaiser : int 11
## $ Analysis :'data.frame': 60 obs. of 8 variables:
## ..$ Eigenvalues : num [1:60] 14.22 6.02 3.01 2.31 2.02 ...
## ..$ Prop : num [1:60] 0.237 0.1004 0.0502 0.0384 0.0337 ...
## ..$ Cumu : num [1:60] 0.237 0.337 0.387 0.426 0.46 ...
## ..$ Par.Analysis: num [1:60] 1.58 1.55 1.51 1.47 1.44 ...
## ..$ Pred.eig : num [1:60] 6.12 3.06 2.34 2.06 1.61 ...
## ..$ OC : chr [1:60] "" "" "" "" ...
## ..$ Acc.factor : num [1:60] NA 5.184 2.31 0.418 -0.152 ...
## ..$ AF : chr [1:60] "(< AF)" "" "" "" ...
## $ Model : chr "components"
## - attr(*, "class")= chr "nScree"
# howManyFactorsToSelect(Xs);
# I could loop over the data like I did in the `kmeans` notebook, but we can now consider functions that already do that ...
library(psych); # install.packages("psych", dependencies=TRUE);
library(GPArotation); # install.packages("GPArotation", dependencies=TRUE);
Xs.parallel = fa.parallel(Xs, fm = "minres", fa = "fa");
## Parallel analysis suggests that the number of factors = 7 and the number of components = NA
str(Xs.parallel);
## List of 10
## $ fa.values: num [1:60] 13.5 5.2 2.22 1.51 1.18 ...
## $ pc.values: num [1:60] 14.22 6.02 3.01 2.31 2.02 ...
## $ pc.sim : logi NA
## $ pc.simr : logi NA
## $ fa.sim : num [1:60] 0.679 0.595 0.545 0.512 0.477 ...
## $ fa.simr : num [1:60] 0.649 0.584 0.543 0.513 0.484 ...
## $ nfact : num 7
## $ ncomp : logi NA
## $ Call : language fa.parallel(x = Xs, fm = "minres", fa = "fa")
## $ values : num [1:20, 1:240] 1.62 1.63 1.6 1.68 1.65 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:240] "C1" "C2" "C3" "C4" ...
## - attr(*, "class")= chr [1:2] "psych" "parallel"
# This is suggesting between 5-6
round( psych::describe(Xs), digits=5);
Xs.factanal.5 = factanal(Xs, factors=5, rotation='none');
# this uses "mle" method ...
# ## rotation ## #
# varimax = orthogonal rotation; keeps the factors uncorrelated
# promax  = oblique rotation; allows the factors to correlate
# none    = leaves the unrotated solution
# Xs.factanal.5 = factanal(covmat=Xs.corr, n.obs=nrow(Xs), factors=5, rotation='varimax');
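Rotations can also be applied after the fact to the unrotated loadings (a sketch; passing rotation="varimax" to factanal does the same thing internally):

```r
L.unrot = loadings(Xs.factanal.5);
L.vmax  = varimax(L.unrot);   # stats::varimax (orthogonal)
L.pmax  = promax(L.unrot);    # stats::promax (oblique)
```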
# Uniqueness
head(Xs.factanal.5$uniquenesses);
## V01 V02 V03 V04 V05 V06
## 0.6868042 0.4506084 0.6030061 0.7272787 0.5076128 0.5700217
# "Uniqueness" is the proportion of a variable's variance NOT explained by the common factors (1 - communality); a high uniqueness means the factors capture little of that variable.
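A quick check of that relationship (sketch): the communality is the row sum of squared loadings, and uniqueness is its complement.

```r
communality = rowSums( Xs.factanal.5$loadings^2 );
head( cbind( communality, uniqueness = 1 - communality ) );
# the uniqueness column should match Xs.factanal.5$uniquenesses (up to convergence tolerance)
```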
# Map of Variables to Factors (Loadings)
print(Xs.factanal.5$loadings, digits=2, cutoff=0.25, sort=FALSE);
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## V01 0.41 -0.34
## V02 0.48 -0.43 -0.30
## V03 0.42 0.28
## V04 0.41 0.29
## V05 0.53 -0.35
## V06 0.39 -0.37 0.35
## V07 0.33
## V08 0.26
## V09 0.59 -0.30
## V10 0.56 -0.32
## V11 0.58 0.35
## V12 0.42 0.37 -0.26
## V13 0.43 0.48
## V14 0.59 -0.48
## V15 0.46
## V16 0.64 -0.34
## V17 0.58 -0.38 0.30
## V18 0.53
## V19 0.49 0.32
## V20 0.36
## V21 0.64
## V22 0.45
## V23 0.55 -0.31
## V24 0.26
## V25 0.29 -0.26 0.36
## V26 0.52 0.33
## V27 0.40 -0.26
## V28 0.29 -0.30 0.46
## V29 0.56 0.32
## V30 0.47 -0.33
## V31 0.47 0.38
## V32 0.45
## V33 0.49 -0.28
## V34 0.60 0.32
## V35 0.33
## V36 0.61
## V37 0.62
## V38 0.57
## V39 0.53
## V40 0.60
## V41 0.32 0.49
## V42 0.56 0.26
## V43 0.56
## V44 0.57
## V45 0.61
## V46 0.51 -0.25
## V47 0.57 -0.33
## V48 0.29 -0.27
## V49 0.57 0.39
## V50 0.45 0.28
## V51 0.52 0.25
## V52 0.52
## V53 0.64 -0.36 -0.25
## V54 0.67
## V55 0.33
## V56 0.56
## V57 0.64
## V58 0.61
## V59 0.48 0.46
## V60 0.52 -0.41
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 13.62 5.49 2.44 1.72 1.44
## Proportion Var 0.23 0.09 0.04 0.03 0.02
## Cumulative Var 0.23 0.32 0.36 0.39 0.41
plot(Xs.factanal.5$loadings[,1:2], type="n");
text(Xs.factanal.5$loadings[,1:2],labels=names(Xs),cex=.7) # add variable names
print("Cool 3D graphs start here");
## [1] "Cool 3D graphs start here"
library(scatterplot3d); # install.packages("scatterplot3d", dependencies=TRUE);
library(rgl); # install.packages("rgl", dependencies=TRUE);
pchs = numeric(ncol(Xs));
pchs[1:30] = 16; # self
pchs[31:60] = 17; # other
# https://www.sessions.edu/color-calculator/
c.choices = c("steelblue", "#b46e46");
colors = character(ncol(Xs));
colors[1:30] = c.choices[1]; # self
colors[31:60] = c.choices[2]; # other
Xs.sp3d = scatterplot3d(Xs.factanal.5$loadings[,1:3],
pch=pchs, color=colors,
grid=TRUE, box=TRUE,
type="p",
angle=22
);
legend(Xs.sp3d$xyz.convert(1.5, -0.75, 1),
legend = c("Self","Other"),
col = c.choices,
text.col = c.choices,
pch = 16:17,
bty = 'n'
);
# radius
rs = numeric(ncol(Xs));
rs[1:30] = 0.05; # self
rs[31:60] = 0.03; # other
# http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization#basic-graph
print("Uncomment here to get more 3D in dynamic interactive form [WILL NOT KNIT]");
## [1] "Uncomment here to get more 3D in dynamic interactive form [WILL NOT KNIT]"
## uncomment, it will not KNIT
# rgl.open(); ## Open a new RGL device
# rgl.bg(color = "white");
# rgl.spheres(Xs.factanal.5$loadings[,1:3],
# r = rs,
# color = colors
# );
# ## right click on a sphere
# identify3d(Xs.factanal.5$loadings[,1:3], labels = names(Xs), n = 5);
### alternatively
## uncomment, it will not KNIT
# plot3d(Xs.factanal.5$loadings[,1:3],
# col=colors, box = FALSE,
# type ="s", radius = rs
# );
## TODO ... create a movie and store in directory ... create a PNG ...
## how about the raw data ...
round( psych::describe(X), digits=5);
X.factanal.5 = factanal(X, factors=5, rotation='none');
head(X.factanal.5$uniquenesses);
## V01 V02 V03 V04 V05 V06
## 0.6868042 0.4506084 0.6030061 0.7272787 0.5076128 0.5700217
print(X.factanal.5$loadings, digits=2, cutoff=0.25, sort=FALSE);
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## V01 0.41 -0.34
## V02 0.48 -0.43 -0.30
## V03 0.42 0.28
## V04 0.41 0.29
## V05 0.53 -0.35
## V06 0.39 -0.37 0.35
## V07 0.33
## V08 0.26
## V09 0.59 -0.30
## V10 0.56 -0.32
## V11 0.58 0.35
## V12 0.42 0.37 -0.26
## V13 0.43 0.48
## V14 0.59 -0.48
## V15 0.46
## V16 0.64 -0.34
## V17 0.58 -0.38 0.30
## V18 0.53
## V19 0.49 0.32
## V20 0.36
## V21 0.64
## V22 0.45
## V23 0.55 -0.31
## V24 0.26
## V25 0.29 -0.26 0.36
## V26 0.52 0.33
## V27 0.40 -0.26
## V28 0.29 -0.30 0.46
## V29 0.56 0.32
## V30 0.47 -0.33
## V31 0.47 0.38
## V32 0.45
## V33 0.49 -0.28
## V34 0.60 0.32
## V35 0.33
## V36 0.61
## V37 0.62
## V38 0.57
## V39 0.53
## V40 0.60
## V41 0.32 0.49
## V42 0.56 0.26
## V43 0.56
## V44 0.57
## V45 0.61
## V46 0.51 -0.25
## V47 0.57 -0.33
## V48 0.29 -0.27
## V49 0.57 0.39
## V50 0.45 0.28
## V51 0.52 0.25
## V52 0.52
## V53 0.64 -0.36 -0.25
## V54 0.67
## V55 0.33
## V56 0.56
## V57 0.64
## V58 0.61
## V59 0.48 0.46
## V60 0.52 -0.41
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 13.62 5.49 2.44 1.72 1.44
## Proportion Var 0.23 0.09 0.04 0.03 0.02
## Cumulative Var 0.23 0.32 0.36 0.39 0.41
plot(X.factanal.5$loadings[,1:2], type="n");
text(X.factanal.5$loadings[,1:2],labels=names(X),cex=.7) # add variable names
## notice any significant differences? why or why not? will this always be the case? why is it the case in this situation?
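One way to investigate (a sketch): factanal() works from the correlation matrix, and correlations are unchanged by scaling, so the raw and scaled fits should produce the same loadings.

```r
all.equal( unclass( X.factanal.5$loadings ),
           unclass( Xs.factanal.5$loadings ) );
```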
The above was to help you understand the key aspects of EFA. Now we want to actually apply it correctly based on the data constraints we know. Normally, I would take the factor loadings and try to understand them in relationship to each feature (e.g., V01).
The loadings can be interpreted as correlations, so if the feature were “Happy” and its value for Factor01 was 0.738, that means “Happy” is part of Factor01. If the same feature were also linked to Factor04 with a value of -0.35, I could conclude that “Not Happy” is part of Factor04. (*See* MonteShaffer_Stats519_HW5.pdf, Problems 2 and 3.)
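That interpretation step can be sketched in code: for each factor, pull out the variables with a sizable loading (the 0.40 cutoff here is an arbitrary choice).

```r
L = unclass( Xs.factanal.5$loadings );
apply( L, 2, function(col) { names( which( abs(col) >= 0.40 ) ) } );
```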
The Personality data has two sections:
Natural Self (Questions 1-30): How do you FEEL YOU REALLY ARE?
Environmental Forces (Questions 31-60): How do you FEEL OTHERS EXPECT YOU TO ACT?
We need to do EFA on these two groups of features: SELF (1-30) and OTHER (31-60).
This summer I updated the old test (which produced the data we are analyzing). A beta site of this new administration of the test can be found here: http://mpt.mshaffer.com/
If interested, you can take the test for FREE and see some of the business propositions I am developing as well. The payment-processing piece (BUY NOW with a CREDIT CARD) is not complete; I hope to finish that over the Winter holiday.
You can take it, and you can invite your family and friends to take it. All I would ask is that your family/friends use your WSU email (e.g., YOURNAME@wsu.edu), even though they should select their own gender, name, and age. If they do (use your email), you will receive an email with their results when the test is complete.
Below are some example links. The email and p-code would change for your report. This is my latest report.
Initial (partial report): http://mpt.mshaffer.com/report/mshaffer@mshaffer.com/p-5f7d9a7fedd5f/
Initial (partial report with exploding time offer): http://mpt.mshaffer.com/report/mshaffer@mshaffer.com/p-5f7d9a7fedd5f/?has-paid=33
Full Report (as-if you paid): http://mpt.mshaffer.com/report/mshaffer@mshaffer.com/p-5f7d9a7fedd5f/?has-paid=true
As you can see from the above, I just took the test and spent less than 4 minutes on it, so the accuracy is probably not perfect, though it is somewhat accurate. I would suggest you take the test when you are in a “reflective mood” and devote at least 15 minutes to it.
I would say my natural self is a D-4 or D-6. This is a version of an older report from 2018 (http://www.mshaffer.com/arizona/Monte_Shaffer__RleeFtjanggAAOn6m00.pdf) that was true at the time: I was working with a lot of young CS interns/hires on an ECE project (https://nsf.gov/awardsearch/showAward?AWD_ID=1819997) to develop technologies to automate exercise routines for persons with Parkinson’s Disease.
The objective is to assess how many factors to extract from the data. Once that is determined, provide a summary of the factors and their relationship to the original words on the personality test.
–DO SOMETHING HERE–
Xs.self = Xs[,1:30];
Xs.self;
library(devtools);
source_url("https://raw.githubusercontent.com/MonteShaffer/humanVerseWSU/master/humanVerseWSU/R/functions-EDA.R");
Xs.self.how.many = howManyFactorsToSelect(Xs.self);
## [1] " Paralell Analysis"
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
## [1] "============================================="
## [1] " VSS Analysis"
## [1] "************************"
## [1] " Eigenvalues >= 1 ... [ n = 6 ]"
## [1] 7.152773 3.893260 2.050145 1.667027 1.189494 1.038370
## [1] "************************"
## [1] "A 4-Factor solution has the most votes!"
## [1] ""
## [1] "Due to Optimal Coordinantes and Parallel Analysis Agreement,"
## [1] "A 4-Factor solution is *strongly* recommended!"
## [1] "************************"
## [1] " Final Analysis of VSS, Eigen, nFactors"
## Factor vote.count
## 1 1 1
## 2 2 2
## 3 3 1
## 4 4 3
## 5 5 1
## 6 6 2
## [1] ""
Xs.self.EFA.factanal = perform.EFA(Xs.self, 6, which="factanal",
rotation="varimax", scores = "regression");
## [1] " KMO test has score: 0.903778669389676 --> marvelous!"
## [1] " Bartlett Test of Sphericity --> Bartlett's test of sphericity ... pvalue < alpha ... 0 < 0.05 ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"
## [1] " Using stats::factanal ... "
## [1] " Overview "
##
## Call:
## stats::factanal(x = X, factors = n.factors, scores = scores, rotation = rotation)
##
## Uniquenesses:
## V01 V02 V03 V04 V05 V06 V07 V08 V09 V10 V11 V12 V13
## 0.530 0.366 0.462 0.677 0.506 0.512 0.739 0.863 0.460 0.574 0.392 0.487 0.549
## V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
## 0.277 0.638 0.359 0.333 0.581 0.615 0.569 0.477 0.613 0.427 0.624 0.641 0.522
## V27 V28 V29 V30
## 0.732 0.595 0.504 0.603
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## V01 0.205 0.332 0.230 0.503
## V02 0.260 0.729 0.173
## V03 -0.679 -0.104 0.147 0.164 -0.103
## V04 0.166 0.530
## V05 0.588 0.239 -0.183 0.230
## V06 0.698
## V07 0.117 0.448 0.149 0.125
## V08 0.211 0.255 -0.142
## V09 0.530 0.148 0.278 -0.174 0.354
## V10 0.309 0.473 0.264 0.160 0.100
## V11 -0.126 0.170 0.724 -0.114 0.162
## V12 0.314 0.363 0.116 0.119 0.502
## V13 0.340 0.475 0.268 0.193
## V14 0.233 0.771 0.147 0.129 0.189
## V15 0.300 0.514
## V16 0.596 0.216 0.198 -0.146 0.417
## V17 0.785 0.153
## V18 0.555 0.210 0.240
## V19 0.117 0.288 0.482 0.178 0.136
## V20 -0.190 -0.158 0.587 0.138
## V21 -0.109 0.266 0.301 0.534 -0.131 0.218
## V22 0.141 0.109 0.592
## V23 0.709 0.236
## V24 0.274 -0.117 0.286 0.206 -0.404
## V25 0.578 -0.144
## V26 0.221 0.196 0.355 -0.133 0.447 0.217
## V27 0.225 0.334 0.141 0.273
## V28 0.198 -0.153 0.246 -0.325 0.418
## V29 -0.107 0.136 0.671
## V30 0.419 0.104 0.216 -0.239 0.325
##
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings 3.828 2.708 2.623 2.448 1.193 0.971
## Proportion Var 0.128 0.090 0.087 0.082 0.040 0.032
## Cumulative Var 0.128 0.218 0.305 0.387 0.427 0.459
##
## Test of the hypothesis that 6 factors are sufficient.
## The chi square statistic is 550.68 on 270 degrees of freedom.
## The p-value is 0.00000000000000000000224
## [1] " Uniqueness as (1-$uniquenesses)"
## [1] " Loadings"
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## V01 0.33 0.50
## V02 0.26 0.73
## V03 -0.68
## V04 0.53
## V05 0.59
## V06 0.70
## V07 0.45
## V08 0.25
## V09 0.53 0.28 0.35
## V10 0.31 0.47 0.26
## V11 0.72
## V12 0.31 0.36 0.50
## V13 0.34 0.48 0.27
## V14 0.77
## V15 0.30 0.51
## V16 0.60 0.42
## V17 0.79
## V18 0.56
## V19 0.29 0.48
## V20 0.59
## V21 0.27 0.30 0.53
## V22 0.59
## V23 0.71
## V24 0.27 0.29 -0.40
## V25 0.58
## V26 0.36 0.45
## V27 0.33 0.27
## V28 -0.33 0.42
## V29 0.67
## V30 0.42 0.33
##
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings 3.83 2.71 2.62 2.45 1.19 0.97
## Proportion Var 0.13 0.09 0.09 0.08 0.04 0.03
## Cumulative Var 0.13 0.22 0.31 0.39 0.43 0.46
## [1] " Scores (Regression) ... saved ... "
Xs.self.EFA.fa = perform.EFA(Xs.self, 6, which="fa",
rotation="oblimin", scores = "regression", fa.fm="ml");
## [1] " KMO test has score: 0.903778669389676 --> marvelous!"
## [1] " Bartlett Test of Sphericity --> Bartlett's test of sphericity ... pvalue < alpha ... 0 < 0.05 ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"
## [1] " Using psych::fa ... "
## [1] " Overview "
## Factor Analysis using method = ml
## Call: psych::fa(r = X, nfactors = n.factors, rotate = rotation, scores = scores,
## fm = fa.fm)
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML2 ML3 ML4 ML6 ML5 h2 u2 com
## V01 0.17 0.07 0.16 -0.13 0.14 0.55 0.47 0.53 1.7
## V02 0.14 0.15 0.75 -0.08 0.06 -0.04 0.63 0.37 1.2
## V03 0.16 0.22 -0.69 -0.11 0.21 -0.05 0.54 0.46 1.6
## V04 0.05 0.01 0.00 0.54 0.07 -0.08 0.32 0.68 1.1
## V05 0.49 -0.13 -0.02 0.14 0.20 0.04 0.49 0.51 1.7
## V06 0.76 0.04 -0.08 -0.07 -0.04 -0.01 0.49 0.51 1.1
## V07 -0.06 0.16 0.03 0.40 0.12 0.10 0.26 0.74 1.7
## V08 0.16 -0.12 -0.09 0.23 0.05 0.06 0.14 0.86 3.0
## V09 0.36 -0.13 0.06 0.15 0.33 0.11 0.54 0.46 2.9
## V10 0.16 0.03 0.42 0.15 0.14 0.08 0.43 0.57 2.0
## V11 0.00 0.73 -0.09 0.10 -0.10 0.12 0.61 0.39 1.2
## V12 0.03 0.10 0.10 0.21 0.02 0.55 0.51 0.49 1.5
## V13 -0.04 0.26 0.22 0.41 -0.03 0.18 0.45 0.55 2.8
## V14 0.07 0.00 0.73 0.01 0.10 0.15 0.72 0.28 1.1
## V15 0.22 0.04 -0.09 0.50 0.04 -0.05 0.36 0.64 1.5
## V16 0.37 -0.09 0.20 0.07 0.44 -0.06 0.64 0.36 2.6
## V17 0.80 -0.02 -0.02 0.03 0.00 0.07 0.67 0.33 1.0
## V18 0.53 -0.03 0.12 0.15 -0.03 0.04 0.42 0.58 1.3
## V19 0.02 0.18 0.18 0.41 0.05 0.13 0.39 0.61 2.1
## V20 -0.17 0.59 -0.05 -0.23 0.21 0.03 0.43 0.57 1.8
## V21 -0.05 0.52 0.16 0.24 -0.15 0.18 0.52 0.48 2.1
## V22 0.04 0.02 0.01 0.58 0.03 0.04 0.39 0.61 1.0
## V23 0.70 -0.02 0.18 0.00 -0.02 -0.06 0.57 0.43 1.1
## V24 0.10 0.04 -0.04 0.33 0.30 -0.42 0.38 0.62 2.9
## V25 0.06 0.60 0.03 -0.02 0.01 -0.21 0.36 0.64 1.3
## V26 -0.01 -0.12 0.09 0.22 0.43 0.28 0.48 0.52 2.6
## V27 0.06 0.09 0.31 0.03 0.29 0.06 0.27 0.73 2.4
## V28 0.01 -0.29 -0.21 0.19 0.42 0.13 0.41 0.59 3.1
## V29 -0.04 0.67 0.05 0.09 -0.04 -0.01 0.50 0.50 1.1
## V30 0.24 -0.20 0.07 0.14 0.33 0.01 0.40 0.60 3.1
##
## ML1 ML2 ML3 ML4 ML6 ML5
## SS loadings 3.34 2.52 2.53 2.37 1.65 1.36
## Proportion Var 0.11 0.08 0.08 0.08 0.05 0.05
## Cumulative Var 0.11 0.20 0.28 0.36 0.41 0.46
## Proportion Explained 0.24 0.18 0.18 0.17 0.12 0.10
## Cumulative Proportion 0.24 0.43 0.61 0.78 0.90 1.00
##
## With factor correlations of
## ML1 ML2 ML3 ML4 ML6 ML5
## ML1 1.00 -0.17 0.30 0.32 0.48 0.06
## ML2 -0.17 1.00 0.10 0.05 -0.16 0.13
## ML3 0.30 0.10 1.00 0.26 0.07 0.42
## ML4 0.32 0.05 0.26 1.00 0.27 0.27
## ML6 0.48 -0.16 0.07 0.27 1.00 0.11
## ML5 0.06 0.13 0.42 0.27 0.11 1.00
##
## Mean item complexity = 1.9
## Test of the hypothesis that 6 factors are sufficient.
##
## The degrees of freedom for the null model are 435 and the objective function was 11.28 with Chi Square of 7515.56
## The degrees of freedom for the model are 270 and the objective function was 0.83
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 678 with the empirical chi square 369.19 with prob < 0.000056
## The total number of observations was 678 with Likelihood Chi Square = 550.68 with prob < 0.0000000000000000000022
##
## Tucker Lewis Index of factoring reliability = 0.936
## RMSEA index = 0.039 and the 90 % confidence intervals are 0.034 0.044
## BIC = -1209.49
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## ML1 ML2 ML3 ML4 ML6 ML5
## Correlation of (regression) scores with factors 0.93 0.91 0.92 0.88 0.85 0.84
## Multiple R square of scores with factors 0.87 0.83 0.85 0.77 0.72 0.70
## Minimum correlation of possible factor scores 0.74 0.65 0.71 0.53 0.43 0.40
## [1] " Uniqueness as (1-$uniquenesses)"
## [1] " Loadings"
##
## Loadings:
## ML1 ML2 ML3 ML4 ML6 ML5
## V01 0.55
## V02 0.75
## V03 -0.69
## V04 0.54
## V05 0.49
## V06 0.76
## V07 0.40
## V08
## V09 0.36 0.33
## V10 0.42
## V11 0.73
## V12 0.55
## V13 0.26 0.41
## V14 0.73
## V15 0.50
## V16 0.37 0.44
## V17 0.80
## V18 0.53
## V19 0.41
## V20 0.59
## V21 0.52
## V22 0.58
## V23 0.70
## V24 0.33 0.30 -0.42
## V25 0.60
## V26 0.43 0.28
## V27 0.31 0.29
## V28 -0.29 0.42
## V29 0.67
## V30 0.33
##
## ML1 ML2 ML3 ML4 ML6 ML5
## SS loadings 2.80 2.39 2.18 1.96 1.19 1.10
## Proportion Var 0.09 0.08 0.07 0.07 0.04 0.04
## Cumulative Var 0.09 0.17 0.25 0.31 0.35 0.39
## [1] " Scores (Regression) ... saved ... "
## [1] " CFI "
## [1] 0.9603591
## [1] " TLI "
## [1] 0.9357244
–DO SOMETHING HERE–
Xs.other = Xs[,31:60];
Xs.other;
library(devtools);
source_url("https://raw.githubusercontent.com/MonteShaffer/humanVerseWSU/master/humanVerseWSU/R/functions-EDA.R");
Xs.other.how.many = howManyFactorsToSelect(Xs.other);
## [1] " Paralell Analysis"
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
## [1] "============================================="
## [1] " VSS Analysis"
## [1] "************************"
## [1] " Eigenvalues >= 1 ... [ n = 5 ]"
## [1] 9.608414 3.446460 1.352283 1.178298 1.066263
## [1] "************************"
## [1] "A 3-Factor solution has the most votes!"
## [1] ""
## [1] "Due to Optimal Coordinantes and Parallel Analysis Agreement,"
## [1] "A 3-Factor solution is *strongly* recommended!"
## [1] "************************"
## [1] " Final Analysis of VSS, Eigen, nFactors"
## Factor vote.count
## 1 1 2
## 2 2 1
## 3 3 3
## 4 4 1
## 5 5 2
## [1] ""
Xs.other.EFA.factanal = perform.EFA(Xs.other, 6, which="factanal",
rotation="varimax", scores = "regression");
## [1] " KMO test has score: 0.946434514031066 --> marvelous!"
## [1] " Bartlett Test of Sphericity --> Bartlett's test of sphericity ... pvalue < alpha ... 0 < 0.05 ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"
## [1] " Using stats::factanal ... "
## [1] " Overview "
##
## Call:
## stats::factanal(x = X, factors = n.factors, scores = scores, rotation = rotation)
##
## Uniquenesses:
## V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43
## 0.457 0.613 0.558 0.460 0.463 0.481 0.527 0.532 0.522 0.515 0.588 0.559 0.566
## V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56
## 0.586 0.493 0.540 0.472 0.754 0.486 0.565 0.324 0.606 0.353 0.426 0.753 0.532
## V57 V58 V59 V60
## 0.457 0.504 0.457 0.414
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## V31 0.138 0.690 0.210
## V32 0.211 0.263 0.498 0.104 0.119
## V33 0.545 0.362
## V34 0.164 0.510 0.398 0.224 0.205
## V35 0.296 0.660
## V36 0.438 0.439 0.361
## V37 -0.224 0.601 0.146 -0.137 0.109 0.101
## V38 0.227 0.570 0.174 0.239
## V39 0.454 0.161 0.483
## V40 0.275 0.457 0.404 0.182
## V41 0.558 0.157 0.205 0.122
## V42 0.245 0.522 0.210 0.243
## V43 0.344 0.378 0.408
## V44 0.321 0.497 0.112 0.222
## V45 0.641 0.246 0.138 0.107
## V46 0.642 0.137 0.115 0.120
## V47 0.674 0.165 0.210
## V48 0.291 0.241 0.289
## V49 0.152 0.577 0.360 0.138
## V50 0.141 0.319 0.521 0.180
## V51 0.164 0.370 0.197 0.681
## V52 0.325 0.520 0.101
## V53 0.758 0.189 0.168
## V54 0.611 0.245 0.206 0.211 -0.216
## V55 0.388 0.135 0.271
## V56 0.639 0.184 0.141
## V57 0.335 0.337 0.468 0.158 0.269
## V58 0.634 0.196 0.200
## V59 0.724 0.105
## V60 0.720 0.146 0.184
##
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings 5.344 4.700 1.802 0.985 0.880 0.727
## Proportion Var 0.178 0.157 0.060 0.033 0.029 0.024
## Cumulative Var 0.178 0.335 0.395 0.428 0.457 0.481
##
## Test of the hypothesis that 6 factors are sufficient.
## The chi square statistic is 466.51 on 270 degrees of freedom.
## The p-value is 0.00000000000113
## [1] " Uniqueness as (1-$uniquenesses)"
## [1] " Loadings"
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## V31 0.69
## V32 0.26 0.50
## V33 0.54 0.36
## V34 0.51 0.40
## V35 0.30 0.66
## V36 0.44 0.44 0.36
## V37 0.60
## V38 0.57
## V39 0.45 0.48
## V40 0.27 0.46 0.40
## V41 0.56
## V42 0.52
## V43 0.34 0.38 0.41
## V44 0.32 0.50
## V45 0.64
## V46 0.64
## V47 0.67
## V48 0.29 0.29
## V49 0.58 0.36
## V50 0.32 0.52
## V51 0.37 0.68
## V52 0.33 0.52
## V53 0.76
## V54 0.61
## V55 0.39 0.27
## V56 0.64
## V57 0.34 0.34 0.47 0.27
## V58 0.63
## V59 0.72
## V60 0.72
##
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings 5.34 4.70 1.80 0.98 0.88 0.73
## Proportion Var 0.18 0.16 0.06 0.03 0.03 0.02
## Cumulative Var 0.18 0.33 0.39 0.43 0.46 0.48
## [1] " Scores (Regression) ... saved ... "
Xs.other.EFA.fa = perform.EFA(Xs.other, 6, which="fa",
rotation="oblimin", scores = "regression", fa.fm="ml");
## [1] " KMO test has score: 0.946434514031066 --> marvelous!"
## [1] " Bartlett Test of Sphericity --> Bartlett's test of sphericity ... pvalue < alpha ... 0 < 0.05 ... \n CONCLUSION: we believe this data is likely suitable for factor analysis or PCA"
## [1] " Using psych::fa ... "
## [1] " Overview "
## Factor Analysis using method = ml
## Call: psych::fa(r = X, nfactors = n.factors, rotate = rotation, scores = scores,
## fm = fa.fm)
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML4 ML2 ML6 ML5 ML3 h2 u2 com
## V31 0.07 0.05 0.56 0.03 0.04 0.25 0.54 0.46 1.5
## V32 0.04 0.61 -0.04 0.04 -0.09 0.10 0.39 0.61 1.1
## V33 0.29 0.01 -0.02 0.45 -0.02 0.05 0.44 0.56 1.8
## V34 -0.11 0.44 0.16 0.21 0.24 -0.02 0.54 0.46 2.6
## V35 0.03 -0.01 0.06 0.05 -0.01 0.71 0.54 0.46 1.0
## V36 0.15 -0.01 0.29 0.46 0.09 0.07 0.52 0.48 2.1
## V37 -0.21 0.18 0.43 -0.19 0.16 0.14 0.47 0.53 3.0
## V38 -0.02 0.19 0.38 0.29 0.10 0.01 0.47 0.53 2.6
## V39 0.09 0.02 -0.02 0.59 0.10 0.06 0.48 0.52 1.1
## V40 0.05 0.47 0.17 0.16 0.05 0.03 0.49 0.51 1.6
## V41 -0.11 0.15 0.34 -0.15 0.27 0.16 0.41 0.59 3.6
## V42 0.11 0.16 0.27 0.06 0.30 0.04 0.44 0.56 3.0
## V43 0.25 0.46 0.13 -0.04 0.01 0.04 0.43 0.57 1.7
## V44 0.24 0.04 0.32 0.03 0.28 0.02 0.41 0.59 2.9
## V45 0.56 0.11 0.20 0.13 -0.08 -0.07 0.51 0.49 1.6
## V46 0.68 0.05 0.07 -0.07 -0.05 0.09 0.46 0.54 1.1
## V47 0.52 0.12 -0.07 0.24 -0.05 -0.01 0.53 0.47 1.6
## V48 0.06 0.24 -0.31 0.30 0.04 0.08 0.25 0.75 3.2
## V49 -0.05 0.43 0.29 0.12 0.06 0.11 0.51 0.49 2.2
## V50 0.09 0.59 0.00 -0.21 0.15 -0.05 0.43 0.57 1.5
## V51 0.03 0.01 -0.04 0.02 0.82 -0.01 0.68 0.32 1.0
## V52 0.22 0.07 0.47 0.14 -0.06 0.02 0.39 0.61 1.7
## V53 0.63 0.11 -0.10 0.17 0.05 -0.02 0.65 0.35 1.3
## V54 0.43 0.18 0.14 0.24 0.07 -0.24 0.57 0.43 3.0
## V55 0.35 0.10 -0.10 0.03 -0.04 0.26 0.25 0.75 2.3
## V56 0.68 -0.06 0.13 -0.07 0.14 -0.04 0.47 0.53 1.2
## V57 0.12 0.47 -0.05 0.10 0.27 -0.01 0.54 0.46 1.9
## V58 0.56 -0.05 0.06 0.10 0.22 -0.01 0.50 0.50 1.4
## V59 -0.03 0.13 0.59 0.04 0.08 0.08 0.54 0.46 1.2
## V60 0.69 0.05 -0.21 0.05 0.01 0.15 0.59 0.41 1.3
##
## ML1 ML4 ML2 ML6 ML5 ML3
## SS loadings 4.14 2.95 2.61 1.99 1.78 0.99
## Proportion Var 0.14 0.10 0.09 0.07 0.06 0.03
## Cumulative Var 0.14 0.24 0.32 0.39 0.45 0.48
## Proportion Explained 0.29 0.20 0.18 0.14 0.12 0.07
## Cumulative Proportion 0.29 0.49 0.67 0.81 0.93 1.00
##
## With factor correlations of
## ML1 ML4 ML2 ML6 ML5 ML3
## ML1 1.00 0.40 0.09 0.60 0.25 0.12
## ML4 0.40 1.00 0.45 0.35 0.53 0.24
## ML2 0.09 0.45 1.00 0.15 0.42 0.29
## ML6 0.60 0.35 0.15 1.00 0.20 0.08
## ML5 0.25 0.53 0.42 0.20 1.00 0.09
## ML3 0.12 0.24 0.29 0.08 0.09 1.00
##
## Mean item complexity = 1.9
## Test of the hypothesis that 6 factors are sufficient.
##
## The degrees of freedom for the null model are 435 and the objective function was 12.75 with Chi Square of 8492.56
## The degrees of freedom for the model are 270 and the objective function was 0.7
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 678 with the empirical chi square 270.69 with prob < 0.48
## The total number of observations was 678 with Likelihood Chi Square = 466.51 with prob < 0.0000000000011
##
## Tucker Lewis Index of factoring reliability = 0.96
## RMSEA index = 0.033 and the 90 % confidence intervals are 0.028 0.038
## BIC = -1293.66
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML1 ML4 ML2 ML6 ML5 ML3
## Correlation of (regression) scores with factors 0.94 0.90 0.90 0.87 0.88 0.80
## Multiple R square of scores with factors 0.88 0.82 0.80 0.75 0.78 0.64
## Minimum correlation of possible factor scores 0.77 0.63 0.61 0.51 0.56 0.29
## [1] " Uniqueness as (1-$uniquenesses)"
## [1] " Loadings"
##
## Loadings:
## ML1 ML4 ML2 ML6 ML5 ML3
## V31 0.56 0.25
## V32 0.61
## V33 0.29 0.45
## V34 0.44
## V35 0.71
## V36 0.29 0.46
## V37 0.43
## V38 0.38 0.29
## V39 0.59
## V40 0.47
## V41 0.34 0.27
## V42 0.27 0.30
## V43 0.46
## V44 0.32 0.28
## V45 0.56
## V46 0.68
## V47 0.52
## V48 -0.31 0.30
## V49 0.43 0.29
## V50 0.59
## V51 0.82
## V52 0.47
## V53 0.63
## V54 0.43
## V55 0.35 0.26
## V56 0.68
## V57 0.47 0.27
## V58 0.56
## V59 0.59
## V60 0.69
##
## ML1 ML4 ML2 ML6 ML5 ML3
## SS loadings 3.40 2.06 2.00 1.34 1.23 0.83
## Proportion Var 0.11 0.07 0.07 0.04 0.04 0.03
## Cumulative Var 0.11 0.18 0.25 0.29 0.33 0.36
## [1] " Scores (Regression) ... saved ... "
## [1] " CFI "
## [1] 0.9756119
## [1] " TLI "
## [1] 0.9604578
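The CFI, TLI, and BIC reported above can be recovered (up to rounding in the printed chi-squares) from the null-model and model chi-squares in the `psych::fa` output. This sketch uses the standard incremental-fit formulas; all input values are copied from the output above.

```r
# Fit indices from the psych::fa output above
chisq.null  = 8492.56; df.null  = 435;   # null (independence) model
chisq.model = 466.51;  df.model = 270;   # 6-factor model
n = 678;                                 # number of observations

# Comparative Fit Index: how much misfit remains relative to the null model
CFI = 1 - (chisq.model - df.model) / (chisq.null - df.null);
CFI;   # ~ 0.9756, matching the reported CFI

# Tucker-Lewis Index: penalizes model complexity via chi-square/df ratios
TLI = (chisq.null/df.null - chisq.model/df.model) / (chisq.null/df.null - 1);
TLI;   # ~ 0.9607, close to the reported 0.9605 (difference is rounding of the printed chi-squares)

# BIC as reported by psych: chi-square minus df * log(n)
BIC = chisq.model - df.model * log(n);
BIC;   # ~ -1293.66, matching the reported BIC
```

Values above roughly 0.95 for CFI/TLI are conventionally read as good fit, so the 6-factor oblimin solution fits this personality data well even though the exact-fit chi-square test rejects.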